Thresholded Lasso for high dimensional variable selection and statistical estimation
Abstract
Given n noisy samples with p dimensions, where n ≪ p, we show that the multi-step thresholding procedure based on the Lasso, which we call the Thresholded Lasso, can accurately estimate a sparse vector β ∈ R^p in a linear model Y = Xβ + ε, where X_{n×p} is a design matrix normalized to have column ℓ2 norm √n, and ε ∼ N(0, σ²I_n). We show that under the restricted eigenvalue (RE) condition (Bickel-Ritov-Tsybakov 09), it is possible to achieve the ℓ2 loss within a logarithmic factor of the ideal mean square error one would achieve with an oracle, while selecting a sufficiently sparse model, hence achieving sparse oracle inequalities; the oracle would supply perfect information about which coordinates are non-zero and which are above the noise level. In this sense, the Thresholded Lasso recovers the choices that would have been made by ℓ0-penalized least squares estimators: it selects a sufficiently sparse model without sacrificing accuracy in estimating β or in predicting Xβ. We also show, for the Gauss-Dantzig selector (Candès-Tao 07), that if X obeys a uniform uncertainty principle and the true parameter is sufficiently sparse, one achieves the sparse oracle inequalities above while allowing at most s0 irrelevant variables in the model in the worst case, where s0 ≤ s is the smallest integer such that, for λ = √(2 log p / n), Σ_{i=1}^p min(β_i², λ²σ²) ≤ s0 λ²σ². Our simulation results on the Thresholded Lasso match our theoretical analysis excellently.
Keywords: Linear regression, Lasso, Gauss-Dantzig Selector, ℓ1 regularization, ℓ0 penalty, multiple-step procedure, ideal model selection, oracle inequalities, restricted orthonormality, statistical estimation, thresholding, linear sparsity, random matrices
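As a concrete illustration of the multi-step procedure described above, the following is a minimal sketch in Python, assuming numpy and scikit-learn; the function names, the threshold constant t0_scale, and the exact penalty scaling are illustrative assumptions, not the paper's tuned constants:

```python
import numpy as np
from sklearn.linear_model import Lasso, LinearRegression

def thresholded_lasso(X, Y, sigma, t0_scale=1.0):
    """Thresholded Lasso sketch: (1) initial Lasso fit, (2) threshold the
    estimate at a level proportional to the noise level lambda*sigma,
    (3) ordinary least squares refit on the selected support.
    t0_scale is an illustrative constant, not the paper's exact tuning."""
    n, p = X.shape
    lam = np.sqrt(2 * np.log(p) / n)   # lambda = sqrt(2 log p / n)
    # Step 1: initial Lasso estimate (sklearn's alpha plays the role of the
    # penalty lambda*sigma, up to its internal 1/n scaling convention).
    beta_init = Lasso(alpha=lam * sigma, fit_intercept=False).fit(X, Y).coef_
    # Step 2: keep only coordinates estimated to be above the noise level.
    support = np.flatnonzero(np.abs(beta_init) > t0_scale * lam * sigma)
    # Step 3: OLS refit on the selected (sufficiently sparse) model.
    beta_hat = np.zeros(p)
    if support.size:
        ols = LinearRegression(fit_intercept=False).fit(X[:, support], Y)
        beta_hat[support] = ols.coef_
    return beta_hat, support

def s0_bound(beta, p, n, sigma):
    """Smallest integer s0 with sum_i min(beta_i^2, lambda^2 sigma^2)
    <= s0 * lambda^2 * sigma^2, as in the abstract."""
    lam2s2 = (2 * np.log(p) / n) * sigma ** 2
    return int(np.ceil(np.minimum(beta ** 2, lam2s2).sum() / lam2s2))
```

Calling thresholded_lasso(X, Y, sigma) on a design with columns rescaled to ℓ2 norm √n returns the refitted estimate together with the selected support, which can be compared against the true non-zero set; s0_bound computes the sparsity quantity s0 from the abstract for a known β.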
Similar Resources
Thresholded Lasso for High Dimensional Variable Selection
Given n noisy samples with p dimensions, where n ≪ p, we show that the multi-step thresholding procedure based on the Lasso, which we call the Thresholded Lasso, can accurately estimate a sparse vector β ∈ R^p in a linear model Y = Xβ + ε, where X_{n×p} is a design matrix normalized to have column ℓ2 norm √n, and ε ∼ N(0, σ²I_n). We show that under the restricted eigenvalue (RE) condition (Bickel-Ritov-T...
The adaptive and the thresholded Lasso for potentially misspecified models (and a lower bound for the Lasso)
We revisit the adaptive Lasso as well as the thresholded Lasso with refitting, in a high-dimensional linear model, and study prediction error, ℓq error (q ∈ {1, 2}), and the number of false positive selections. Our theoretical results for the two methods are, at a rather fine scale, comparable. The differences only show up in terms of the (minimal) restricted and sparse eigenvalues, favor...
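The adaptive Lasso that this entry compares with the thresholded-refit procedure can be sketched as a reweighted ℓ1 fit; a minimal version, assuming numpy and scikit-learn, with alpha, gamma, and eps as illustrative tuning choices:

```python
import numpy as np
from sklearn.linear_model import Lasso

def adaptive_lasso(X, Y, alpha=0.1, gamma=1.0, eps=1e-6):
    """Two-stage adaptive Lasso sketch: an initial Lasso fit supplies
    data-dependent weights w_j = 1/|beta_init_j|^gamma, so the second,
    reweighted l1 fit shrinks small initial coefficients more heavily.
    alpha, gamma, and eps are illustrative tuning choices."""
    beta_init = Lasso(alpha=alpha, fit_intercept=False).fit(X, Y).coef_
    w = 1.0 / (np.abs(beta_init) + eps) ** gamma
    # Reweighted l1 via rescaling: solve the plain Lasso in the rescaled
    # variables X_j / w_j, then map the coefficients back.
    Xw = X / w                      # broadcasts over columns
    beta_w = Lasso(alpha=alpha, fit_intercept=False).fit(Xw, Y).coef_
    return beta_w / w
```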
Efficient Clustering of Correlated Variables and Variable Selection in High-Dimensional Linear Models
In this paper, we introduce the Adaptive Cluster Lasso (ACL) method for variable selection in high-dimensional sparse regression models with strongly correlated variables. To handle correlated variables, the concept of clustering or grouping variables and then pursuing model fitting is widely accepted. When the dimension is very high, finding an appropriate group structure is as difficult as the ori...
Sure Independence Screening for Ultra-High Dimensional Feature Space
High dimensionality is a growing feature in many areas of contemporary statistics. Variable selection is fundamental to high-dimensional statistical modeling. For problems of large or huge scale p_n, computational cost and estimation accuracy are always two top concerns. In a seminal paper, Candes and Tao (2007) propose a minimum ℓ1 estimator, the Dantzig selector, and show that it mimics the id...
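The screening step behind Sure Independence Screening can be sketched in a few lines; a minimal version, assuming numpy and standardized columns, with the submodel size d an illustrative default on the order of n / log n:

```python
import numpy as np

def sis_screen(X, Y, d=None):
    """Sure Independence Screening sketch: rank features by the magnitude
    of their marginal correlation with the response and keep the top d,
    reducing dimension from huge p to moderate d before a refined method
    (e.g., the Dantzig selector or the Lasso) is run on the survivors."""
    n, p = X.shape
    if d is None:
        d = max(1, int(n / np.log(n)))     # illustrative default
    # Componentwise regression coefficients / marginal correlations
    # (assumes standardized columns).
    omega = X.T @ Y / n
    return np.argsort(-np.abs(omega))[:d]  # indices of the d largest
```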
Publication year: 2010